Ajjit Narayanan, Fay Walker, Aaron R. Williams
What is 2 + 2?
## [1] 4
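The slide shows only the printed result. A minimal sketch of the console input that would produce it, assuming the expression was typed directly at the R prompt:

```r
# R evaluates the expression and prints the result, prefixed by [1],
# which is the index of the first element shown on that line.
2 + 2
```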
What is the median diamond price with carat > 1 and a “Good” cut?
## # A tibble: 1 x 1
## `median(price)`
## <int>
## 1 6412
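The code behind this result is not shown. One way to produce a one-row tibble with a column named `median(price)` is sketched below, assuming the `diamonds` data set from ggplot2 and the dplyr verbs `filter()` and `summarize()`; the original slide may have used different code.

```r
library(dplyr)    # filter(), summarize(), and the %>% pipe
library(ggplot2)  # provides the diamonds data set

# Keep diamonds heavier than 1 carat with a "Good" cut,
# then collapse the remaining rows to a single median price.
diamonds %>%
  filter(carat > 1, cut == "Good") %>%
  summarize(median(price))
```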
How could increasing the retirement age affect the poverty rates of Hispanic women ages 62 and older?
Deliberate steps should be taken to minimize the chance of making an error and maximize the chance of catching errors when errors inevitably occur.
Computational reproducibility should be embraced to improve accuracy, promote transparency, and prove the quality of analytic work.
Replication, the recreation of findings across repeated studies, is a cornerstone of science.
Reproducibility: the ability to access data, source code, tools, and documentation and recreate all calculations, visualizations, and artifacts of an analysis
Computational reproducibility should be the minimum standard for computational social sciences and statistical programming
Code should be written so humans can easily understand what’s happening—even if it occasionally sacrifices machine performance.
Analyses should be designed so strangers can understand each and every step without additional instruction or inquiry from the original analyst.
Research and data are non-rivalrous and can be non-excludable. They are public goods that should be widely and easily shared. Decisions about tools, methods, data, and language during the research process should be made in ways that promote the ability of anyone and everyone to access an analysis.
Analysts should seek to make all parts of the research process more efficient by communicating clearly, adopting best practices, and managing computation.
.R and .Rmd
“Good coding style is like correct punctuation: you can manage without it, butitsuremakesthingseasiertoread.” ~ Hadley Wickham
Collections of R, C, C++, and FORTRAN code that expand the functionality of R.
Comprehensive set of tools for data science
Core: ggplot2, dplyr, tidyr, readr, purrr, tibble, stringr, forcats
Free text by Hadley Wickham and Garrett Grolemund
Scalars (do not exist in R)
Vectors
## [1] 1 2 3 4 5
Matrices
## [,1] [,2] [,3]
## [1,] 1 3 5
## [2,] 2 4 6
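The vector and matrix printed above could be built as follows. This is a sketch: the object names `v` and `m` are hypothetical, and the original slides may have used `c()` rather than the `:` operator.

```r
# A vector is a one-dimensional collection of values of a single type.
v <- 1:5
v

# A matrix is a two-dimensional collection of values of a single type.
# matrix() fills column by column by default.
m <- matrix(1:6, nrow = 2)
m
```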
Data frames, multidimensional arrays
## # A tibble: 4 x 4
## name awake brainwt bodywt
## <chr> <dbl> <dbl> <dbl>
## 1 Cheetah 11.9 NA 50
## 2 Owl monkey 7 0.0155 0.48
## 3 Mountain beaver 9.6 NA 1.35
## 4 Greater short-tailed shrew 9.1 0.00029 0.019
Character
## [1] "a" "b" "c" "d" "e"
Numeric
## [1] 1 2 3 4 5
Logical
## [1] TRUE TRUE FALSE TRUE FALSE
Factor
## [1] good ok bad ok ok
## Levels: good ok bad
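A sketch of how vectors of each type shown above might be created. The object names are hypothetical, and `class()` is used here only to confirm the type of each vector.

```r
letters_vec <- c("a", "b", "c", "d", "e")           # character
numbers_vec <- c(1, 2, 3, 4, 5)                     # numeric
logical_vec <- c(TRUE, TRUE, FALSE, TRUE, FALSE)    # logical

# A factor stores categorical data; levels sets the order of the categories.
ratings <- factor(
  c("good", "ok", "bad", "ok", "ok"),
  levels = c("good", "ok", "bad")
)

class(letters_vec)
class(numbers_vec)
class(logical_vec)
class(ratings)
```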
NA is R’s encoding for missing values
## [1] NA
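Missing values propagate through most calculations. A short sketch, with a hypothetical vector `x`, of detecting `NA` and excluding it from a summary:

```r
x <- c(1, NA, 3)

is.na(x)                # TRUE where a value is missing
mean(x)                 # NA, because one input is missing
mean(x, na.rm = TRUE)   # drops the missing value before averaging
```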
R can hold many different objects at the same time. Storing the result of code requires assignment (<-).
## [1] 4
## [1] 4
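A minimal sketch of assignment, assuming an object named `x` (the original object name is not shown):

```r
# Without assignment the result is printed and then discarded.
2 + 2

# With <- the result is stored under a name and can be reused later.
x <- 2 + 2
x
```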
Arguments by position
## [1] 2.5
Arguments by name
## [1] 2.5
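The calls behind these two results are not shown. One possibility, assuming `mean()` and a hypothetical input vector whose mean is 2.5:

```r
# By position: the first argument of mean() is x, so it does not need a name.
by_position <- mean(c(1, 2, 3, 4))
by_position

# By name: the same call with the argument spelled out.
by_name <- mean(x = c(1, 2, 3, 4))
by_name
```

Naming arguments is more verbose, but it makes a call easier to read when a function takes several of them.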
Function documentation
?mean
Rule of three: never copy and paste the same code three or more times; write a function instead.
test_oddness <- function(x) {
  ifelse(test = x %% 2 == 0, yes = "even!", no = "odd!")
}
test_oddness(1:10)
## [1] "odd!" "even!" "odd!" "even!" "odd!" "even!" "odd!" "even!" "odd!"
## [10] "even!"
What will it take to convince you that your code is correct?
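One possible answer is to check a function against cases whose results are known in advance. The sketch below repeats `test_oddness()` so that it runs on its own, and uses base R's `stopifnot()`, which is an assumption here and not necessarily the approach the slides had in mind.

```r
test_oddness <- function(x) {
  ifelse(test = x %% 2 == 0, yes = "even!", no = "odd!")
}

# stopifnot() is silent when every condition is TRUE
# and raises an error at the first condition that is not.
stopifnot(
  test_oddness(1) == "odd!",
  test_oddness(2) == "even!",
  identical(test_oddness(c(3, 4)), c("odd!", "even!"))
)
```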
Use directories such as data/, scripts/, and outputs/.
Use / in file paths regardless of operating system.
Use setwd() to shortcut much of an absolute file path.
.Rproj files are a superior solution, available only in R.
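A sketch of building a relative path with base R's `file.path()`, which joins its arguments with `/` on every operating system. The file name `survey.csv` is hypothetical.

```r
# Build a path relative to the project directory instead of typing an absolute path.
data_path <- file.path("data", "survey.csv")
data_path
```

The relative path resolves correctly on any machine when the working directory is the project root, which is what opening the project's .Rproj file sets in RStudio.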
setwd() in R
sessionInfo()
Submit install.packages("tidyverse") to the console
library(tidyverse)
Photo by StataCorp LP, CC BY-SA 4.0, Unaltered
Source is unknown